AD-A246  S2S 


STOCHASTIC  VERSIONS  OF  THE  EM  ALGORITHM 


by 


Jean-CSaude  Biscarat 
Gilles  Cdeux 
Jean  Diebolt 


TECHNICAL  REPORT  No.  227 
January  1992 


Department  of  Statistics,  GN>22 
University  of  Washington 
Seattle,  Washington  98195  USA 


Z  19  105 


STOCHASTIC  VERSIONS  OF  THE  EM  ALGORITHM 

Jean-Claude  BISCARAT*,  GiUes  CELEUX**  and  Jean  DffiBOLT*i 

*  LSTA,  45-55,  Universite  Paris  VI,  4  Place  Jussieu  75252  Paris  Cedex,  France 
**  INRIA  Rocquencourt  78150  Le  Chesnay,  France 


Abstract:  We  compare  three  different  stochastic  versions  of  the  EM  algorithm:  the  SEM  algorithm,  the 
SAEM  algorithm  and  the  MCEM  algorithm.  We  suggest  that  the  most  relevant  contribution  of  the 
MCEM  methodology  is  what  we  call  the  simulated  annealing  MCEM  algorithm,  which  turns  out  to  be 
very  close  to  SAEM.  We  focus  particularly  on  the  mixture  of  distributions  problem.  In  this  context,  we 
review  the  available  theoretical  results  on  the  convergence  of  these  algorithms  and  on  the  behavior  of 
SEM  as  the  sample  size  tends  to  infinity.  Finally,  we  illustrate  these  results  with  some  Monte-Carlo 
numerical  simulations. 
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1.  Introduction 

The  EM  algorithm  (Dempster,  Laird  and  Rubin  1977)  is  a  popular  and  often 
efficient  approach  to  maximum  likelihood  (ML)  estimation  or  for  locating  the  posterior 
mode  of  a  distribution  (Tanner  and  Wong  1987,  Green  1990  and  Wei  and  Tanner  1990) 
for  incomplete  data.  However,  despite  appealing  features,  the  EM  algorithm  has  several 
well-documented  limitations:  Its  limiting  position  can  strongly  depend  on  its  starting 
position,  its  rate  of  convergence  can  be  painfully  slow  and  it  can  provide  a  saddle  point  of 
the  likelihood  function  (l.f.)  rather  than  a  local  maximum.  Moreover,  in  certain  situations, 
the  maximization  step  of  EM  is  untractable. 

Several  authors  have  proposed  various  nonstochastic  improvements  on  the  EM 
algorithm  (e.g.,  Louis  1982,  Meilijson  1989,  Nychka  1990,  Silverman,  Jones,  Wilson 
and  Nychka  1990,  Green  1990).  However,  none  of  these  improvements  resulted  in  a 
completely  satisfactory  version  of  EM.  The  basic  motivation  of  each  of  the  three 
stochastic  versions  of  the  EM  algorithm  that  we  study  in  the  present  paper  is  to  overcome 
the  above-mentioned  limitations  of  EM.  These  stochastic  versions  of  EM  are  the  SEM 
algorithm  (Broniatowski,  Celeux  and  Diebolt,  1983  and  Celeux  and  Diebolt,  1985),  the 
SAEM  algorithm  (Celeux  and  Diebolt,  1989)  and  the  MCEM  algorithm  (Wei  and  Tanner, 
1990  and  Tanner,  1991).  The  purpose  of  the  present  paper  is  to  compare  the 
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characteristics  of  these  stochastic  versions  of  EM  and  to  focus  on  the  relationships 
between  MCEM  and  the  two  other  algorithms. 

The  motivations  of  the  introduction  of  a  simulation  step  making  use  of 
pseudorandom  draws  at  each  iteration  are  not  the  same  for  SEM  and  MCEM  (SAEM  is 
but  a  variant  of  SEM).  On  one  hand,  the  simulation  step  of  SEM  relies  on  the  Stochastic 
Imputation  Principle  (SEP):  Generate  pseudo-completed  samples  by  drawing  potential 
unobserved  samples  from  their  conditional  density  given  the  observed  data,  for  the 
current  fit  of  the  parameter.  On  the  other  hand,  MCEM  replaces  analytic  computation  of 
the  conditional  expectation  of  the  log-likelihood  of  the  complete  data  given  the 
observations  by  a  Monte-Carlo  approximation.  However,  despite  different  motivations, 
both  SEM-S  AEM  and  MCEM  can  be  considered  as  random  perturbations  of  the  discrete¬ 
time  dynamical  system  generated  by  EM.  This  is  the  reason  of  their  successful  behavior: 
First,  the  random  perturbations  prevent  these  algorithms  from  staying  near  the  unstable  or 
hyperbolic  fixed  points  of  EM,  as  well  as  from  its  nonsignificant  stable  fixed  points. 
Moreover,  the  underlying  EM  dynamics  helps  them  finding  good  estimates  of  the 
parameter  in  a  comparatively  small  number  of  iterations.  Finally,  the  statistical 
considerations  directing  the  simulation  step  of  these  algorithms  lead  to  a  proper  data- 
driven  scaling  of  the  random  perturbations.  In  Section  2,  we  present  each  of  these  three 
algorithms  in  the  same  general  setting,  the  ideas  which  underlie  them  and  their  key 
properties.  In  Section  3,  we  show  how  these  algorithms  apply  to  the  mixture  problem. 
Given  some  reference  measure,  a  density  f(y)  is  a  finite  mixture  of  densities  from  some 

parametrized  family  -d  =  {(p(x,a):  a  e  A}  if  f(y) 

K 

f(y)  =  I  pk  9(y.  ak),  (1.1) 

k=l 

for  some  finite  integer  K,  where  the  weights  pk  are  in  (0,  1)  and  sum  to  one.  The  mixture 
problem  consists  in  identifying  the  weighting  parameters  pi,  ...,  Pk  and  the  parameters 
ai,  ...,  aK  of  the  component  densities,  on  the  basis  of  a  sample  of  i.i.d.  observations  yi, 
...,  yN  issued  from  (1.1).  The  mixture  problem  and  its  variants  and  extensions  is  among 
the  most  relevant  areas  of  application  of  the  EM  methodology  (Redner  and  Walker  1984, 
Titterington,  Smith  and  Makov  1985).  In  the  same  section,  the  main  results  concerning 
the  convergence  properties  of  the  algorithms  EM,  SEM,  SAEM  and  MCEM  in  the 
mixture  context  are  reviewed.  Moreover,  a  theorem  about  the  asymptotic  behavior  of 
SEM  in  a  particular  mixture  setting  as  the  sample  size  goes  to  infinity  is  stated.  Also,  a 
theorem  about  the  almost  sure  (a.s.)  convergence  of  a  simulated  annealing  type  version  of 
the  MCEM  algorithm  (abbreviated  s.a.  MCEM  hereafter)  is  stated.  None  of  the  proofs  of 
these  various  results  is  given  in  this  paper,  since  each  of  them  is  very  technical  and  the 
purpose  is  rather  to  provide  a  brief  comparative  study.  However,  detailed  references  are 
given  for  interested  readers.  Some  illustrative  numerical  Monte-Carlo  experiments  are 
presented  in  Section  4. 
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2.  The  EM  algorithm  and  its  stochastic  versions 

2.1.  The  EM  algorithm.  The  EM  algorithm  (Dempster  et  al.  1977)  is  an  iterative 
procedure  designed  to  find  ML  estimates  in  the  context  of  parametric  models  where  the 
observed  data  can  be  viewed  as  incomplete.  In  this  subsection,  we  briefly  review  the 
main  features  of  EM.  The  observed  data  y  are  supposed  to  be  issued  from  the  density 
g(y|0)  with  respect  to  some  a-finite  measure  that  we  denote  dy.  Our  objective  is  to 
estimate  0  by  0  =  arg  max  L(0),  where  L(0)  =  log  g(y|6).  The  basic  idea  of  EM  is  to 
take  advantage  of  the  usual  expressibility  in  a  closed  form  of  the  ML  estimate  of  the 
complete  data  x  =  (y,  z).  Here,  z  denotes  the  unobserved  (or  latent)  data.  The  EM 
algorithm  replaces  the  maximization  of  the  unknown  l.f.  f(x|0)  of  the  complete  data  x  by 
successive  maximizations  of  the  conditional  expectation  Q(0',  0)  of  log  f(x|0')  given  y 
for  the  current  fit  of  the  parameter  0.  More  formally,  let  k(z  1  y,  0)  =  f(x  1 0)/g(y  1 0) 
denote  the  conditional  density  of  z  given  y  with  respect  to  some  o-finite  measure  dz. 
Then 


Q(0',0)=  k(z  I  y,  0)  log  f(x  1  0’)  dz,  (2.1) 

where  Z  denotes  the  sample  space  of  the  latent  data  z.  Given  the  current  approximation  0*^ 
to  the  ML  estimate  of  the  observed  data,  the  EM  iteration  0'’+^  =  Tn(0O  involves  two 
steps:  The  E  step  computes  Q  (0,  00  and  the  M  step  detei mines  0*’+l  =  arg  maxe  Q  (6, 
00-  This  updating  process  is  repeated  until  convergence  is  apparent. 

The  EM  algorithm  has  the  basic  property  that  each  iteration  increases  the  l.f.,  i.e. 
L(0‘’+O  ^  L(0O  with  equality  iff  Q(0^+^  00  =  Q  (00  00-  A  detailed  account  of 
convergence  properties  of  the  sequence  {0*’}  generated  by  EM  can  be  found  in  Dempster 
et  al.  (1977)  and  Wu  (1983).  Under  suitable  regularity  conditions,  {0'’}  converges  to  a 
stationary  point  of  L(0).  But  when  there  are  several  stationary  points  (local  maxima, 
minima,  saddle  points),  it  can  occur  that  {0*’}  does  not  converge  to  a  significant  local 
maximum  of  L(0). 

In  practical  implementations,  the  EM  algorithm  has  been  observed  to  be  extremely 
slow  in  some  (important)  applications.  As  noted  in  Dempster  et  al.  (1977),  the 
convergence  rate  of  EM  (at  least  when  the  initial  position  is  not  too  far  from  the  true  value 
of  the  parameter)  is  linear  and  governed  by  the  fraction  of  missing  information.  Thus, 
slow  convergence  generally  appears  when  the  proportion  of  missing  information  is  high. 
On  the  other  hand,  when  the  log-likelihood  surface  is  littered  with  saddle  points  and  sub- 
optimal  maxima,  the  limiting  position  of  EM  greatly  depends  on  its  initial  position. 


2.2.  The  SEM  algorithm.  The  SEM  algorithm  (Broniatowski,  Celeux  and  Diebolt 
1983,  and  Celeux  and  Diebolt  1985,  1987)  has  been  designed  to  answer  the  above- 
mentioned  limitations  of  EM.  The  basic  idea  underlying  SEM  is  to  replace  the 
computation  and  maxintization  of  Q  (0, 00  by  the  much  simpler  computation  of  k(z  1  y, 
00  and  simulation  of  an  unobserved  pseudosample  z*',  and  then  to  update  0’"  on  the  basis 
of  the  pseudocompleted  sample  x*"  =  (y,  zO.  Thus,  SEM  incorporates  a  stochastic  step  (S 
step)  between  the  E  and  M  steps.  This  S  step  is  directed  by  the  Stochastic  Imputation 
Principle:  Generate  a  completed  sample  x*"  =  (y,  z*")  by  drawing  z’’  from  the  conditional 
density  k(z  1  y,  00  given  the  observed  data  y,  for  the  current  fit  O'"  of  the  parameter. 
SEM  basically  tests  the  mutual  consistency  of  the  current  guess  of  the  parameter  and  of 
the  corresponding  pseudo-completed  samples.  Since  the  updated  estimate  0*^+^  is  the  ML 
estimate  computed  on  the  basis  of  x*",  an  analytic  expression  of  0’''*''  as  a  function  of  x*" 
can  be  derived  in  a  closed  form  in  all  relevant  situations. 

Remark  22.1.  The  random  drawings  prevent  the  sequence  {0’"}  generated  by  SEM  from 
converging  to  the  first  stationary  point  of  the  log-l.f.  it  encounters.  At  each  iteration,  there 
is  a  non-zero  probability  of  accepting  an  updated  estimate  0’’+^  with  lower  likelihood 
value  than  0’‘.  This  is  the  basic  reason  why  SEM  can  avoid  the  saddle  points  or  the 
nonsignificant  local  maxima  of  the  If. 

Remark  2.2.2.  The  random  sequence  {0''}  generated  by  SEM  does  not  converge 
pointwise.  It  turns  out  that  this  sequence  is  a  homogeneous  Markov  chain,  which  is 
irreducible  whenever  k(x  I  y,  0)  is  positive  for  almost  every  0  and  x.  This  condition  is 
satisfied  in  most  contexts  where  SEM  can  be  applied.  If  {0’’}  turns  out  to  be  ergodic, 
then  it  converges  to  the  unique  stationary  probability  distribution  y  of  this  Markov  chain. 
Indeed,  this  is  a  situation  very  similar  to  that  prevailing  in  the  Bayesian  approach,  except 
that  y  cannot  be  viewed  as  a  posterior  probability  resulting  from  Bayes  formula.  Like  in 
the  Bayesian  perspective,  all  the  information  for  inference  on  0  is  contained  in  the 
probability  distribution  \|/:  The  empirical  mean  of  \j/  provides  a  point  estimate  of  0  and  the 
empirical  variance  matrix  of  y  provides  information  on  the  accuracy  of  this  point 
estimate.  Since  the  simulation  step  of  0  in  any  Bayesian  sampling  algorithm  (see  Remark 
2.2.4  below)  is  replaced  in  SEM  by  a  deterministic  ML  step,  the  variance  matrix  of  y  is 
smaller  than  the  inverse  of  the  observed-data  Fisher  information  matrix. 

Remark  2.2.3.  In  some  situations,  the  M  step  of  the  EM  algorithm  is  not  analytically 
tractable  (e.g.  censored  data  from  Weibull  distributions,  see  Chauveau  1990).  Since  SEM 
maximizes  the  log-l.f.  of  pseudocompleted,  it  does  not  involve  such  difficulties. 
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Remark  2.2.4.  In  a  Bayesian  perspective,  the  tractability  of  the  complete  data  likelihood 
f(y,  z  1 0)  is  viewed  as  that  of  the  posterior  density  of  6  given  the  complete  data. 


7c(0|y,  z)  = 


f(y,  z|0)  7t(0) 

J  f(y,z|0’)7t(0')d0'  ’ 
© 


(2.2) 


where  7t(0)  is  the  prior  density  on  0.  In  this  context,  it  is  natural  to  replace  the  M  step  of 
SEM  by  a  step  of  simulation  of  0  from  7t(01y,  z).  This  is  actually  the  essence  of  the  Data 
Augmentation  algorithm  of  Tanner  and  Wong  (1987),  which  can  thus  be  considered  as 
the  Bayesian  version  of  SEM.  Alternatively,  SEM  can  be  recovered  from  the  Data 
Augmentation  algorithm  by  taking  a  suitable  noninformative  prior  7i(0)  and  replacing  the 
simulation  step  of  0  from  7t(01y,  z)  by  an  imputation  step  where  0  is  updated  as 
l07i(0|y,  z)d0.  See  Diebolt  and  Robert  (1991)  for  more  details  in  the  mixture  context. 

Remark  22.5.  Since  SEM  can  be  seen  as  a  stochastic  perturbation  of  the  EM  algorithm, 
it  is  still  directed  by  the  EM  dynamics.  Thus,  SEM  can  find  the  most  stable  fixed  points 
of  EM  in  a  comparatively  small  number  of  iterations.  The  most  stable  a  fixed  point  0f  of 
EM  is,  the  longest  the  mean  sojourn  time  of  SEM  in  small  neighborhoods  of  0f  is.  Since 
the  stability  of  a  fixed  point  0f  of  EM  is  linked  to  the  matrix  (I  -  Jobs)  (6f)»  where  I  is 
the  identity  matrices,  and  Jc(0f)  and  Jobs(0f)  the  complete  and  observed  Fisher 
information  matrices,  respectively  (see,  e.g.,  Dempster  et  al.,  1977,  and  Redner  and 
Walker,  1984),  the  stationary  distribution  y  of  the  SEM  sequence  concentrates  around 
the  stable  fixed  point  of  EM  for  which  the  information  available  without  knowing  the 
missing  data,  Jobs- is  the  largest.  This  is  in  accordance  with  the  approach  of 
Windham  and  Cutler  (1991),  who  base  their  estimate  of  the  number  of  mixture 
components  on  the  smallest  eigenvalue  of  1^’*  Jobs-  Also,  the  SIP  provides  a  satisfactory 
data-driven  magnitude  for  the  random  perturbations  of  SEM.  When  the  sample  of 
observed  data  is  small  and  contains  few  information  about  the  true  value  of  the  parameter 
0,  the  variance  of  these  random  perturbations  becomes  large.  This  is  natural  since  in  such 
a  case  no  guess  0’’  of  0  is  very  likely,  so  that  the  updated  0’''*^*  arising  from  the  pseudo- 
completed  sample  x*”  =  (y,  zO  generated  from  k(z  |  y,  0*')  is  comparatively  far  from  0^ 
with  high  probability  and  the  variance  of  the  stationary  distribution  Xj/  of  SEM  is  large. 
Such  an  erratic  behavior  makes  SEM  difficult  to  handle  for  small  sample  sizes.  This  is  the 
reason  why  we  have  introduced  the  SAEM  algorithm,  described  in  the  next  Subsection. 


2.3.  The  SAEM  algorithm.  The  SAEM  algorithm  (Celeux  and  Diebolt  1989)  is  a 
modification  of  the  SEM  algorithm  such  that  convergence  in  distribution  can  be  replaced 
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by  a.s.  convergence  and  the  possible  erratic  behavior  of  SEM  for  small  data  sets  can  be 
attenuated  without  sacrificing  the  stochastic  nature  of  the  algorithm.  This  is  accomplished 
by  making  use  of  a  sequence  of  positive  real  numbers  Yr  decreasing  to  zero,  which 
parallel  the  temperatures  in  Simulated  Annealing  (see,  e.g.,  van  Laarhoven  and  Arts 
1987).  More  precisely,  if  S'"  is  the  current  fit  of  the  parameter  via  SAEM,  the  updated 
approximation  to  0  is 

=  (1  -  Yr+l)  e  pt!  +  7r+l  ,  (2.3) 

EM  SEM 

r+1  r+1 

where  0  ^  (resp.  0  ^  ^  )  is  the  updated  approximation  of  0  via  EM  (resp.  SEM). 

Remark  2.3.1.  SAEM  is  going  from  pure  SEM  at  the  beginning  towards  pure  EM  at  the 
end.  The  choice  of  the  rate  of  convergence  to  0  of  Yr  is  very  important.  Typically,  a  slow 
rate  of  convergence  is  necessary  for  good  performance.  From  a  practical  point  of  view,  it 
is  important  that  Yr  stays  near  YO  =  1  during  the  first  iterations  to  let  the  algorithm  avoid 
suboptimal  stationary  values  of  L(0).  From  a  theoretical  point  of  view,  we  will  see  in 
Section  3  that,  in  the  mixture  context,  we  essentially  need  the  assumptions  that  limr_4oo 
(Yr^Yr+i)  =  1  and  Xr  Yr  =  *”  to  ensure  the  a.s.  convergence  of  SAEM  to  a  local  maximizer 
of  the  log-l.f.  L(0)  whatever  the  starting  point. 

2.4.  The  MCEM  algorithm.  The  MCEM  algorithm  (Wei  and  Tanner  1990)  proposes 
a  Monte-Carlo  implementation  of  the  E  step.  It  replaces  the  computation  of  Q(0,  0’’)  by 
that  of  an  empirical  version  Qr+i  (0,  0'’),  based  on  m  (m  »  1)  drawings  of  z  from 
k(z  iy,  0'')-  More  precisely,  the  r  th  step:  (a)  Generates  an  i.i.d.  sample  z''(l),...,  z''(m) 
from  k(z  1  y,  0'’)  and  (b)  Updates  the  current  approximation  to  Q(0,  0’')  as 

9 

1  n*  I 

Qr+i(0,  00  =  —  I  log  f(y,  z^Cj))  1 0  ).  (2.4) 

j=’ 


(c)  Then,  the  M  step  provides  0^"^^  =  arg  maxe  Qr+i(0,  0’’). 

Remark  2.4.7.  If  m  =  1,  MCEM  reduces  to  SEM. 

Remark  2.4.2.  If  m  is  large,  MCEM  works  approximatively  like  EM;  thus  it  has  the  same 
drawbacks  as  EM.  Moreover,  if  m  >  I,  maximizing  Qr+i(0,  0’’)  can  turn  out  to  be  nearly 
as  difficult  as  maximizing  Q(0,  OO- 


6 


Remark  2.43.  Wei  and  Tanner  motivated  the  introduction  of  the  MCEM  algortihm  as  an 
alternative  which  replaces  analytic  computation  of  the  integral  in  (2.1)  by  numerical 
computation  of  a  Monte-Carlo  approximation  to  this  integral.  On  the  contrary,  SEM  does 
not  involve  exact  or  approximate  computation  of  Q(0, 60:  the  only  computation  involved 
in  its  E  step  is  that  of  the  conditional  density  k(z  |  y.  S'").  Moreover,  its  M  step  is 
generally  straightforward,  since  it  consists  in  maximizing  the  likelihood  f(y,  1 0)  of  the 
completed  sample  (y,  zO-  Thus,  in  all  situations  where  SEM  works  well,  it  should  be 
preferred  to  MCEM. 

Remark  2.4.4.  The  discussion  in  Remark  2.4.3  points  out  that  numerical  integration  of 
(2.1)  is  not  the  real  interest  of  MCEM.  From  the  comments  of  Wei  and  Tanner  (1990) 
and  Tanner  (1991)  about  the  specification  of  m,  it  turns  out  that  the  real  interest  of 
MCEM  is  its  simulated  annealing  type  version,  in  the  spirit  of  the  SAEM  algorithm. 
Indeed,  Wei  and  Tanner  recommend  to  start  with  small  values  of  m  and  then  to  increase 
m  as  6’’  moves  closer  to  the  true  maximizer  of  L(6).  More  precisely,  if  we  select  a 
sequence  {mr}  of  integers  such  that  mo  =  1  and  mr  increases  to  infinity  as  r— at  a 
suitable  rate  and  perform  the  rth  iteration  with  m  =  mr,  then  we  go  from  pure  SEM  (mo  = 
1)  to  pure  EM  (m  =  oo)  as  r— >00.  Since  the  variance  of  the  random  perturbation  term  then 
decreases  to  zero,  the  resulting  MCEM  version  can  be  viewed  as  a  particular  type  of 
simulated  annealing  method  with  1/mr  playing  the  role  of  the  temperature.  For  brevity, 
we  call  this  algorithm  the  simulated  annealing  MCEM  algorithm  (s.a.  MCEM)  throughout 
this  paper.  Note  that  the  s.a.  MCEM  can  still  be  used  when  no  tractable  expression  of 
Q(0,  0'’)  can  be  derived,  in  contrast  with  SAEM. 

Remark  2.4  J.  Wei  and  Tanner  established  no  convergenc**.  result  for  MCEM  or  its 
simulated  annealing  version.  In  Section  3,  we  state  a  theorem  which  ensures  the  a.s. 
convergence  of  the  simulated  annealing  MCEM  to  a  local  maximizer  of  L(0)  for  suitable 
sequences  {mr},  under  reasonable  assumptions,  in  the  mixture  context.  This  result 
shows  the  interest  of  this  version  of  MCEM.  It  is  proved  in  Biscarat  (1991)  and  has  been 
derived  from  previous  results  (Celeux  and  Diebolt  1991a  and  Biscarat  1991)  about  the 
convergence  of  SAEM. 

3.  A  basic  example:  the  mixture  case 

3.1.  The  incomplete  data  structure  of  mixture  data.  We  now  focus  on  the 
mixture  of  distributions  problem.  It  is  one  of  the  areas  where  the  EM  methodology  has 
found  its  most  significant  contributions.  Many  authors  have  studied  the  behavior  of  EM 
in  this  context  from  both  a  practical  and  a  theoretical  point  of  view:  e.g.,  Redner  and 
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Walker  (1984),  Titterington,  Smith  and  Makov  (1985),  Celeux  and  Dieboli  (1985), 
McLachlan  and  Basford  (1989)  and  Titterington  (1990).  For  simplicity,  we  will  restrict 
ourselves  to  mixtures  of  densities  from  the  same  exponential  family  (see,  e.g.,  Redner 
and  Walker  1984  and  Celeux  and  Diebolt  1991a). 

The  observed  i.i.d.  sample  y  =  (yi,...,  yN)  is  assumed  to  be  drawn  from  the 
mixture  density 

K 

f(y)  =  S  Pk  q>(y.  ak),  (3.1) 

k=l 


where  y  e  R4,  the  mixing  weights  pk,  0  <  pk  <  1,  sum  to  one  and 

9(y,  a)  =  D(a)-l  n(y)  exp{aT  b(y)},  (3.2) 

where  a  is  a  vector  parameter  of  dimension  s,  a^  denotes  the  transpose  of  a,  n  ;  >  R 

and  b  :  R^  — >  Rs  are  sufficiently  smooth  functions  and  D(a)  is  a  normalizing  factor.  The 
parameter  to  be  estimated  6  =  (pi, ...,  pK-i»  ai, ...,  aK)  lies  in  some  subset  ©  of  R*^' 
l+sK 

In  this  context,  the  complete  data  can  be  written  x  =  (y,  z)  =  {(y,,  zj),  i  =  1, ..., 
N),  where  each  vector  of  indicator  variables  zj  =  (zy,  j  =  1, K)  is  defined  by  Zij  =  1 
or  0  depending  on  whether  the  ith  observation  yi  has  been  drawn  from  the  jih  component 
density  <p(y,  aj)  or  from  another  one.  Owing  to  independence,  k(zly,  0)  can  be  split  into 
the  product  fl,  k(zi|yi,  0)  where  the  probability  vector  k(z|y,0)  is  defined  by 


k(z|y,  0)  = 


p(z)  9(y,  a(z)) 
K 

I  Ph  <P(y,  ah) 

h=l 


(3.3) 


with  p(z)  =  pj  and  a(z)  =  aj  iff  z  =  (0,  1,  ...,  0),  1  being  in  the  jlh  position.  The 

posterior  probability  that  y,  has  been  drawn  from  the ^th  component  is 

=  (3.4) 

I  Ph  <P(yi,  ah) 

h=l 

The  log-l.f.  takes  the  form  L(0)  =  Li  log{I^  Pj  (p(yi,  aj)}  and  (Titterington  et  al.  1985) 

N  K 

Q(e',  9)  =  I  I  tj(yi,  0)  {log  Pj  +  log(p(yi,  a’j)}.  (3.5) 

1=1  j=i 
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3.2.  EM.  The  E  step  of  the  EM  algorithm  computes  the  posterior  probabilities  t[j  =  tj(yi, 

6^),  i=l,...,N  and  j=l,...,K,  according  to  (3.3)  and(3.4)  and  the  M  step  provides  the 
updating  formulas 


N  r 

I  t  j,  j  = 

i=l 


(3.6) 


and 


N  r 

I  t^b(yi) 

i=l  J 

r+1  _ 

~  N 

i=l  •* 


j  =  1,  ...,  K. 


(3.7) 


3.3.  SEM.  The  E  step  of  SEM  is  the  same  as  above.The  S  step  independently  draws 
each  Zj,  i  =  1, ...,  N,  from  a  multinomial  distribution  with  parameters  {tjj,  j=I,  ...,  K}. 

If 


1  N 


Z;.  >  c(N)  for  all  j=l,  ...,  K, 


(3.8) 


then  go  to  the  M  step  below.  Here,  c(N)  is  a  threshold  satisfying  0  <  c(N)  <  1  and  c(N) 
->  0  as  N  The  role  of  condition  (3.8)  is  to  avoid  numerical  singularities  in  the  M 
step.  Typically,  we  chose  c(N)  =  (d+l)/N  (Celeux  and  Diebolt  1985).  If  (3.8)  is  not 
satisfied,  then  the  new  's  are  drawn  from  some  preassigned  distribution  on  Z  such  that 

(3.8)  holds  and  then  go  to  the  M  step.  The  M  step  provides 


,  I  1  N 

Pj  .S  Zjj  -  J=l. K 


and 


N 


i  N 


I  z  H  b(yi) 


j=l,...,  K. 


r 

I 


i=l 


(3.9) 


(3.10) 
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3.4.  SAEM.  The  formulas  for  the  SAEM  algorithm  can  be  directly  derived  from  the 
above  descriptions  of  EM  and  SEM  and  from  (2.3).  They  are  not  detailed  here. 

3.5.  MCEM.  We  now  turn  to  the  description  of  the  MCEM  algorithm  in  the  mixture 
case.  Again,  the  E  step  is  as  in  Subsection  3.2.  The  Monte-Carlo  step  generates  m 
independent  samples  of  indicator  variables  zTh)=  (Zj(h),  ...,  2j^(h))  (h  =  1,  ...,  m)  from 

the  conditional  distributions  {ty,  j=l,  ...,  K}  (i  =  1,  ...,  N).  Thus,  from  the  definition  of 
MCEM,  the  updated  approximation  *  maximizes 

Qr+l(e,  eo  =  ^  I  S  {log  p(z;(h))  +  log  (p(yi,  a(z;(h)))}  (3.11) 

"^h=l  i=l 

where  p(2|(h))  =  pj  and  a(2{(h))  =  aj  iff  Zy.(h)  =  1.  Equality  (3.1 1)  can  also  be  written 
N  K 

Qr+i(0,  00  =  S  I  Uj.  {log  Pj  +  log  (p(yi,  aj)}, 
i=l  j=l  ■' 

where 

^  #  (h:  h  =  1,  m,  zO(h)  =  1} 

~  m 

represents  the  frequency  of  assignment  of  yi  to  the  j\h  mixture  component,  at  the  rth 
iteration,  along  the  m  drawings.  Comparing  (3.5)  and  (3.12),  it  appears  that  MCEM  just 
replaces  the  posterior  probabilities  ty  by  the  frequencies  Uy  in  the  formulas  (3.6)  and 

(3.7)  resulting  from  the  M  step  of  the  EM  algorithm. 

Remark  3.5.1.  As  for  SEM,  the  random  drawings  which  lead  to  the  z^(h)'s  are  started 
afresh  from  a  suitable  distribution  on  Z  if  condition  (3.8)  is  not  satisfied. 

Remark  3.5.2.  As  noticed  above,  starting  with  m=l  and  increasing  m  to  infinity  as  the 
iteration  index  grows  will  produce  the  s.a.  MCEM  algorithm,  quite  analogous  to  SAEM 
(see  Section  3.4).  Tf  the  s.a.  MCEM  is,  in  some  sense,  more  elegant  and  natural  than 
SAEM,  it  is  dramatically  more  time  consuming  than  SAEM,  since  it  involves  more  and 
more  random  drawings. 

3.6.  Convergence  properties.  This  subsection  lists  the  main  results  concerning  the 

convergence  properties  of  the  algorithms  examined  in  this  paper,  in  the  particular  context 
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(3.12) 


(3.13) 


of  mixtures  from  some  exponential  family,  which  is  the  area  of  application  of  EM  and  its 
various  versions  where  the  most  precise  results  are  available.  Moreover,  a  result 
concerning  the  asymptotic  behavior  of  the  stationary  distribution  of  3EM  as  the  sample 
size  N  — >  oo  is  stated.  The  proofs  of  these  results  are  not  given  in  this  paper,  since  they 
are  very  technical. 

A.  Concerning  EM,  Redner  and  Walker  (1984)  have  proved  the  following  local 
convergence  result. 

Theorem  1  (Redner  and  Walker  1984).  -  In  the  context  of  mixtures  from  some 
exponential  family,  if  the  Fisher  information  matrix  evaluated  at  the  true  0  is  positive  and 
the  mixture  proportions  are  positive,  then  with  probability  fforN  sufficiently  large,  the 
unique  strongly  consistent  solution  0n  of  the  likelihood  equations  is  well  defined  and  the 
sequence  {0’’}  generated  by  EM  converges  linearly  to  0n  whenever  the  starting  point  0*^ 
is  sufficiently  near  0n. 

B.  Concerning  SEM,  Celeux  and  Diebolt  (1986,  1991b)  have  proved  the  ergodicity  of 
the  sequence  {0^^}  generated  by  SEM  in  the  mixture  context.  The  proof  reduces  to 
showing  that  the  sequence  {z‘‘}  is  a  finite-state  homogeneous  irreducible  and  aperiodic 
Markov  chain.  This  result  guarantees  weak  convergence  of  the  distribution  of  0*’  to  the 
unique  stationary  distribution  of  the  ergodic  Markov  chain  generated  by  SEM.  (The 
index  N  indicates  dependence  on  the  sample.)  However,  such  a  result  does  not  guarantee 
that  Vfsf  is  concentrated  around  the  consistent  ML  estimator  0^  of  0  mentioned  in 
Theorem  1.  Celeux  and  Diebolt  (1986,  1991b)  have  examined  the  asymptotic  behavior  of 
H/n.  They  start  by  showing  that  the  SEM  sequence  satifies  a  recursive  relation  of  the  form 

0^+1  =TN(0r)  + VN(0^zr),  (3.14) 

where  Tn  denotes  the  EM  operator  and  :  R^x  Z  — ^  Rs  is  a  measurable  function  such 
that  VN  Vn(0'',  z^)  converges  in  distribution  as  N  — »  «>,  uniformly  in  0’’  6  Gn  (compact 
subset  of  0),  to  some  Gaussian  r.v.  with  mean  0  and  positive  variance  matrix.  In  the 
particular  case  of  a  two-component  mixture  where  the  the  mixing  proportion  p  is  the  only 
unknown  parameter,  they  have  established  the  following  result. 

Theorem  2  (Celeux  and  Diebolt  (1986,  1991b)).  -  Let  f(x  |  p)  =  p(pi(x)  +  (1  -  p)(p2(x),  0 
<  p  <  \  ,  be  the  density  with  respect  to  some  o  finite  measure  p.(dx)  of  a  two-component 
mixture  where  the  densities  9i  and  92  assumed  known.  Then,  for  a  suitable  rate  of 
convergence  to  0  of  the  threshold  c(N)  introduced  in  (3.8): 
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(i) .  For  N  large  enough,  the  EM  operator  Tf^{Tp),  0  <  p  <  1,  has  a  unique  fixed  point  pN  in 
the  interval  Gn  =  [c(N),  1  -  c(N)],  pN  is  the  unique  maximizer  of  the  If.  on  (0,  1)  and 
linifi^  PN  =  p. 

(ii) .  //Wn  denotes  a  r.v.  whose  distribution  is  the  stationary  distribution  \j/N  of  SEM, 
then  VN(Wn  -  pn)  converges  in  distribution  to  a  Gaussian  r.v.  with  mean  0  and  variance 
cf  =  p(l-p)u/(l-u2),  where 

0<„4 - (3.15, 

p(pi(x)  +  (1  -  p)(P2(x) 

Remark  3.6.1.  Theorem  2  suggests  the  conjecture  that  a  similar  behavior  should  hold  in 
the  general  mixture  context.  Celeux  and  Diebolt  (1986)  could  only  prove  such  a  result 
under  the  stringent  assumption  that  0n  is  the  unique  fixed  point  of  Tn  in  Gn  in  addition  to 
some  technical  assumptions. 

C.  Concerning  SAEM,  which  can  be  expressed  as 

er+u  Tn(0O  +  Yr  VN(e^  zO  (3.16) 

(see  (2.3)  and  (3.14)),  Celeux  and  Diebolt  (1991a)  have  established  the  following 
convergence  result 

Theorem  3  (Celeux  and  Diebolt  1991a)).  -  In  the  context  of  mixtures  from  some 
exponential  family,  assume  that  for  some  convex  compact  subset  Hn  of  ©,  the 
following  assumptions  (HI)  -  (H5)  hold. 

(HI)  The  set  of  fixed  points  o/Tn  contained  in  Hn  is  finite  and  thi  »-e  exists  at  least  a 
stable  fixed  point  o/Tn  in  Hn- 

(H2)  For  any  fixed  point  0*  in  Hn,  the  matrix  D2L(0*)  is  nonsingular. 

(H3)  There  exists  p  >  0  such  that  for  any  0  in  Hn,  the  ball  with  center  T(0)  and  radius 
p  is  contained  in  Hn. 

(H4)  For  any  0  in  Hn,  any  hyperplane  P  of  such  thatJ(Q)  e  P  and  any  half-space 
D  o/Rs  spanned  by  P,  the  set  of  those  points  of  the  form  Tn(0)  +  Vn(0,  z),  z  e 
Z,  which  are  in  D  is  non-empty. 

(HS)  The  sequence  (Yr)  of  positive  numbers  with  YO  =  1  decreases  to  zero  ar  r  -4  oo 
and  satiates  Yr  =  cr'l^  for  some  positive  constant  c  and  some  p,  0  <  p  <  1 . 

Then  the  sequence  {0^}  generated  by  SAEM  converges  a.  s.  to  a  local  maximizer  of  the  1. 
f,  whatever  its  starting  point. 
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Remark  3.6.2.  For  EM,  the  possibility  of  convergence  to  a  saddle  point  of  the  l.f.  is 
always  present.  On  the  contrary,  Theorem  3  ensures  that  SAEM  does  not  converge  to 
such  a  point  a.s.  The  basic  reason  why  SAEM  achieves  better  results  than  EM  is  that 
SAEM  does  not  necessarily  terminate  in  the  first  local  maximum  encountered. 

Remark  3.6.3.  The  assumption  (H4)  is  very  technical,  but  is  reasonable  for  N  large 
enough,  since  the  number  of  points  of  the  form  Tn(0)  +  Vn(6,  z),  z  e  Z,  is  equal  to  K^. 

D.  Concerning  the  s.a.  MCEM  algorithm,  which  can  be  expressed  as 

er+l  =  TnCGO  + 1=  (3.17) 

V  m(r) 

where  =  {z''(h),  h=l,...,  m(r)}  represents  the  vector  of  the  m(r)  samples  drawn  at 
iteration  r  and  U!!,  ;  >  R*  is  a  measurable  function,  Biscarat  (1991)  has 

established  the  following  convergence  result 

Theorem  4  (Biscarat  1991).  -  In  the  context  of  mixtures  from  some  exponential  family, 
assume  that  for  some  convex  compact  subset  Hn  of  ©,  the  above  assumptions  (HI)  - 
(H3)  hold  along  with  the  following  assumption  (H5  bis): 

(H5  bis).  There  exists  a  positive  constant  o  such  that  r®  =  o{m(r)}  (r-->oo). 

Then  the  sequence  {0’’}  generated  by  the  s.a.  MCEM  algorithm  converges  a.  s.  to  a  local 
maximizer  of  the  l.f,  whatever  its  starting  point. 

Remark  3.6.4.  Theorem  4  does  not  require  the  technical  assumption  (H4).  This  is 
essentially  due  to  the  fact  that  the  noise  is  asymptotically  Gaussian  as  r  — > «»  .  in  this 

perspective,  MCEM  can  be  thought  of  as  more  natural  than  SAEM. 

6.  Numerical  comparisons  of  EM,  SEM,  SAEM  and  MCEM 

In  this  section,  we  compare  the  practical  behavior  of  EM,  SEM,  SAEM  and 
MCEM  on  the  basis  of  a  Monte-Carlo  experiment  in  a  small  sample  context  for  a 
Gaussian  mixture  which  is  somewhat  difficult  to  identify.  For  each  sample  size  N  =  100 
and  N  =  60,  we  gene.ated  50  samples  from  an  univariate  four-component  Gaussian 
mixture  with  weights  pi  =  P2  =  P3  =  P4  =  0-25,  means  mi  =  2,  m2  =  5,  m3  =9  and  014 
=  15  and  variances  Oj  =  0.625,  Oj  =  0.25,  Oj  =  1  and  =  4.  For  each  generated 

sample,  we  performed  200  iterations  of  EM,  SEM  and  SAEM  using  two  different 
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initialization  schemes.  In  the  first  one,  we  started  from  the  true  parameter  values.  In  the 
second  one,  the  initial  positions  of  the  parameters  were  drawn  at  random  as  follows.  We 
first  drew  uniformly  four  points  Ci, ...,  C4  among  {yi, ...,  yN).  We  then  partitioned  the 
sample  into  four  clusters  by  aggregating  each  yi,  i  =  1, ...,  N,  around  the  nearest  of  C], 
...,  C4.  Finally,  we  computed  the  initial  parameters  on  the  basis  of  these  four  clusters. 

In  order  to  derive  in  a  simple  way,  from  the  SEM  scheme,  a  reliable  pointwise 
estimate  of  the  mixture  parameters,  we  used  a  hybrid  algorithm.  We  ran  200  SEM 
iterations.  Then  we  ran  10  additional  EM  iterations  starting  from  the  position  which 
achieved  the  largest  value  of  the  likelihood  function  among  these  200  SEM  iterations. 

The  choice  of  the  rate  of  convergence  to  0  of  the  sequence  {yr}  for  SAEM  is  very 
important  and  delicate.  From  our  experience,  the  following  cooling  schedule  turns  out  to 
give  good  results  :  Yr  =  cos  ra  for  0  <  r  <  20,  and  Yr  =  c/Vr  for  21  <  r  <  200,  where  cos 
(20  a)  =  c  /  V2^  =  0.3.  We  performed  the  s.a.  MCEM  algorithm  with  mr  =  [l/y^]  and  Yr 

as  above  to  have  the  same  cooling  rate  for  SAEM  and  s.a.  MCEM,  in  view  of  (3.16)  and 
(3. 17).  Moreover,  in  these  numerical  experiments,  we  did  not  take  into  account  the  trials 
for  which  one  of  the  mixing  weights  pj  (j  =  1,  4;  r  =  1,  ...,  200)  became  smaller  than 

2/N.  This  procedure  ensures  that  (3.8)  holds  with  c(N)  =  2/N. 

The  results  obtained  when  starting  from  the  true  parameter  values  are  not  reported 
here.  For  both  sample  sizes,  the  four  algorithms  gave  similar  satisfactory  estimates. 
Therefore,  when  strong  prior  information  on  the  parameters  is  available,  EM  should  be 
preferred  to  its  stochastic  variants  even  for  small  sizes,  because  of  its  simplicity. 

The  results  corresponding  to  random  initial  positions  are  displayed  in  Table  1.  In 
this  table,  the  first  row  indicates  the  algorithm  which  has  been  performed.  The  second 
row  indicates  the  sample  size.  The  row  '#’  provides  the  number  of  successful  trials  (out 
of  50).  Then,  the  rows  'pi'  -  'pa',  'mi'  -  ’ma'  and  'Oj'  -  'a^'  provide  the  average  and 

standard  deviation,  into  brackets,  of  the  estimates  of  these  parameters  over  the  '#' 
recorded  trials. 


Table  1  about  here 


These  results  highlight  the  main  practical  differences  between  EM  and  the  three 
stochastic  versions  of  EM  under  consideration. 

Since  the  samples  were  small,  the  likelihood  functions  were  littered  with  many 
local  maxima,  and  the  EM  algorithm  provided  solutions  which  greatly  depended  on  its 
initial  positions.  This  is  apparent  from  the  mean  values  of  the  first  two  component 
parameters,  which  are  far  from  the  true  values,  and  from  the  large  standard  deviations  of 
the  estimates. 
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These  simulations  illustrate  the  ixx>r  performance  of  SEM  for  small  samples  (only 
17  successful  trials  out  of  50  for  N  =  60  with  the  random  initialization);  This  is  the  reason 
why  SAEM,  which  can  be  regarded  as  a  simulated  annealing  type  version  of  SEM,  has 
been  proposed.  However,  SEM  provides  good  results  when  the  trials  can  be  achieved 
and  other  numerical  experiments  for  moderate  sample  sizes  highlight  its  ability  to  avoid 
unstable  stationary  points  of  the  likelihood  (see,  e.g.,  Celeux  and  Diebolt  1985). 

They  also  show  that  if  the  temperatures  in  SAEM  and  s.a.  MCEM  are  chosen 
such  that  the  magnitudes  of  the  perturbations  of  both  algorithms  decrease  at  similar  rates, 
then  SAEM  and  s.a.  MCEM  give  very  close  results;  furthermore,  if  the  rates  of  the 
temperatures  are  suitably  chosen,  then  both  algorithms  remain  more  stable  than  SEM  for 
small  samples  and  give  better  results  than  EM;  however,  the  major  drawback  of  s.a. 
MCEM  is  that  it  requires  more  and  more  pseudorandom  draws  as  the  iteration  number 
increases:  The  CPU  time  of  each  run  of  s.a.  MCEM  in  our  trials  was  about  25  times  the 
corresponding  time  for  SAEM.  Another  drawback,  common  to  SAEM  and  s.a.  MCEM, 
is  that  no  data-driven  determination  of  the  temperatures  is  available  (see  also  the  Monte- 
Carlo  simulations  in  Celeux  and  Diebolt,  1991a). 
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alg. 

EM 

SEM 

SAEM 

MCEM 

N 

100 

100 

100 

100 

# 

50 

28 

38 

36 

Pi 

0.28  (0.14) 

0.23  (0.05) 

0.24  (0.05) 

0.24  (0.05) 

P2 

0.26  (0.10) 

0.24  (0.03) 

0.24  (0.04) 

0.24  (0.04) 

P3 

0.20  (0.09) 

0.28  (0.05) 

0.26  (0.06) 

0.27  (0.06) 

P4 

0.25  (0.10) 

0.25  (0.05) 

0.26  (0.04) 

0.25  (0.05) 

mj 

2.34  (0.70) 

2.02  (0.05) 

2.01  (0.05) 

2.02  (0.05) 

ni2 

5.96  (2.10) 

4.99  (0.12) 

4.98  (0.12) 

4.99(0.14) 

9.80  (2.31) 

9.14  (0.26) 

9.04  (0.23) 

9.06  (0.24) 

m4 

15.00  (1.29) 

15.04  (0.57)  14.99  (0.51)  14.98  (0.61) 

2 

0.71  (1.07) 

0.07  (0.02) 

0.06  (0.02) 

0.07  (0.02) 

1 

02 

0.77  (1.55) 

0.22  (0.08) 

0.23  (0.08) 

0.28  (0.10) 

2 

<^3 

1.09(1.48) 

1.10(0.58) 

1.08  (0.76) 

1.11  (0.73) 

2 

^^4 

3.91  (3.25) 

3.29  (1.40) 

3.71  (1.67) 

3.47  (1.55) 

EM 

SEM 

SAEM 

MCEM 

60 

60 

60 

60 

50 

17 

30 

27 

0.32  (0.14) 

0.26  (0.06) 

0.25  (0.05) 

0.25  (0.05) 

0.26  (0.12) 

0.27  (0.05) 

0.25  (0.07) 

0.26  (0.05) 

0.21  (0.10) 

0.23  (0.05) 

0.26  (0.07) 

0.27  (0.07) 

0.22  (0.11) 

0.24  (0.07) 

0.25  (0.06) 

0.23  (0.07) 

2.45  (0.76) 

2.10  (0.38) 

2.01  (0.06) 

2.08  (0.06) 

6.22  (1.96) 

5.33  (1.06) 

5.00  (0.14) 

4.97  (0.13) 

10.13(2.53) 

9.37  (1.34) 

9.01  (0.29) 

9.09  (0.34) 

15.28  (1.44)  15.06  (0.70) 

15.00  (0.65)  15.02  (0.67) 

0.80  (1.22) 

0.20  (0.58) 

0.06  (0.02) 

0.07  (0.02) 

0.96  (1.88) 

0.29  (0.20) 

0.23  (0.08) 

0.26  (0.12) 

0.90  (0.96) 

0.79  (0.46) 

0.92  (0.68) 

0.88  (0.56) 

3.25  (3.00) 

3.64  (2.96) 

3.62  (2.33) 

3.78  (2.50) 

Table  I.  Number  of  successful  trials  (row  ’W),  mean  and  standard  deviation  of  the  estimates  of  the 
mixture  parameters  using  EM,  SEM,  SAEM  and  MCEM,  with  random  initializations,  for  the  sample 
sizes  N  =  100  and  60. 
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